Containers & Packaging


T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Neural Information Processing Systems

Despite the stunning ability of recent text-to-image models to generate high-quality images, current approaches often struggle to effectively compose objects with different attributes and relationships into a complex and coherent scene. We propose T2I-CompBench, a comprehensive benchmark for open-world compositional text-to-image generation, consisting of 6,000 compositional text prompts from 3 categories (attribute binding, object relationships, and complex compositions) and 6 sub-categories (color binding, shape binding, texture binding, spatial relationships, non-spatial relationships, and complex compositions). We further propose several evaluation metrics specifically designed to evaluate compositional text-to-image generation and explore the potential and limitations of multimodal LLMs for evaluation. We introduce a new approach, Generative mOdel finetuning with Reward-driven Sample selection (GORS), to boost the compositional text-to-image generation abilities of pretrained text-to-image models. Extensive experiments and evaluations are conducted to benchmark previous methods on T2I-CompBench, and to validate the effectiveness of our proposed evaluation metrics and GORS approach.
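The abstract describes GORS only at a high level: generate samples, score them with a compositional reward, and finetune on the highest-scoring ones. A minimal sketch of that selection-and-reweighting loop is below; the generate/reward interface, the threshold, and every identifier are illustrative assumptions, not the paper's code.

```python
# Hedged sketch of reward-driven sample selection finetuning (GORS-style).
# `generate`, `reward_fn`, `model.loss`, and `threshold` are assumed
# interfaces for illustration only.
import torch

def gors_step(model, optimizer, prompts, generate, reward_fn, threshold=0.8):
    """Generate candidates, keep high-reward ones, and finetune with a
    reward-weighted loss."""
    samples = [generate(model, p) for p in prompts]             # candidate images
    rewards = torch.tensor([reward_fn(p, s) for p, s in zip(prompts, samples)])
    keep = rewards >= threshold                                 # reward-driven selection
    if not keep.any():
        return None                                             # nothing passed the bar
    losses = torch.stack([model.loss(p, s)                      # e.g. a denoising loss
                          for p, s, k in zip(prompts, samples, keep) if k])
    loss = (losses * rewards[keep]).mean()                      # reward-weighted objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Weighting the kept samples by their rewards, rather than training on them uniformly, keeps the update biased toward the compositions the reward model scores highest.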


Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback

arXiv.org Artificial Intelligence

Large text-to-video models hold immense potential for a wide range of downstream applications. However, these models struggle to accurately depict dynamic object interactions, often resulting in unrealistic movements and frequent violations of real-world physics. One solution inspired by large language models is to align generated outputs with desired outcomes using external feedback. This enables the model to refine its responses autonomously, eliminating extensive manual data collection. In this work, we investigate the use of feedback to enhance the object dynamics in text-to-video models. We aim to answer a critical question: what types of feedback, paired with which specific self-improvement algorithms, can most effectively improve text-video alignment and realistic object interactions? We begin by deriving a unified probabilistic objective for offline RL finetuning of text-to-video models. This perspective highlights how design elements in existing algorithms like KL regularization and policy projection emerge as specific choices within a unified framework. We then use derived methods to optimize a set of text-video alignment metrics (e.g., CLIP scores, optical flow), but notice that they often fail to align with human perceptions of generation quality. To address this limitation, we propose leveraging vision-language models to provide more nuanced feedback specifically tailored to object dynamics in videos. Our experiments demonstrate that our method can effectively optimize a wide variety of rewards, with binary AI feedback driving the most significant improvements in video quality for dynamic interactions, as confirmed by both AI and human evaluations. Notably, we observe substantial gains when using reward signals derived from AI feedback, particularly in scenarios involving complex interactions between multiple objects and realistic depictions of objects falling.
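The abstract does not spell out its unified probabilistic objective, but its mention of KL regularization suggests the standard reward-finetuning form; as a hedged reconstruction (with r a reward on video x given prompt c, p_ref the pretrained model, and β a regularization weight, all assumed notation):

$$\max_{\theta}\; \mathbb{E}_{x \sim p_\theta(\cdot \mid c)}\!\left[ r(x, c) \right] \;-\; \beta\, D_{\mathrm{KL}}\!\left( p_\theta(\cdot \mid c) \,\middle\|\, p_{\mathrm{ref}}(\cdot \mid c) \right),$$

whose optimum is $p^{*}(x \mid c) \propto p_{\mathrm{ref}}(x \mid c)\, \exp\!\left(r(x, c)/\beta\right)$. Projecting the finetuned model toward this tilted distribution is one way design choices like policy projection fall out of a single objective, which is consistent with the framing in the abstract.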


Machine Learning in Industrial Quality Control of Glass Bottle Prints

arXiv.org Artificial Intelligence

In industrial manufacturing of glass bottles, quality control of bottle prints is necessary because numerous factors can negatively affect the printing process. Even minor defects in the bottle prints must be detected despite reflections in the glass and manufacturing-related deviations. In cooperation with our medium-sized industrial partner, we developed and evaluated two ML-based approaches for quality control of these bottle prints that can be used even in this challenging scenario. Our first approach used different filters to suppress reflections (e.g. Sobel or Canny) and image quality metrics for image comparison (e.g. MSE or SSIM) as features for different supervised classification models (e.g. SVM or k-Neighbors), resulting in an accuracy of 84%. The images were aligned with the ORB algorithm, which allowed us to estimate the rotations of the prints; these rotations may serve as an indicator of anomalies in the manufacturing process. In our second approach, we fine-tuned different pre-trained CNN models (e.g. ResNet or VGG) for binary classification, resulting in an accuracy of 87%. Using Grad-CAM on our fine-tuned ResNet-34, we were able to localize and visualize frequently defective regions of the bottle prints. This method allowed us to provide insights that could be used to optimize the actual manufacturing process. This paper also describes our general approach and the challenges we encountered in practice with data collection during ongoing production, unsupervised preselection, and labeling.
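As a concrete illustration of the first approach, here is a minimal sketch: edge filters to dampen reflections, MSE and SSIM as comparison features against a defect-free reference print, and a classical classifier on top. The thresholds, feature set, and train/test handling are assumptions for illustration, not the paper's pipeline, and the prints are assumed to be pre-aligned (the paper uses ORB keypoint matching for alignment).

```python
# Hedged sketch: reflection-suppressing filters + image-quality metrics as
# features for a classical classifier. All parameter values are illustrative.
import cv2
import numpy as np
from skimage.metrics import structural_similarity as ssim
from sklearn.svm import SVC

def print_features(reference, test):
    """Compare an aligned test print against a defect-free reference print
    (both 8-bit grayscale), on raw pixels and on Canny edge maps."""
    feats = []
    for a, b in [(reference, test),
                 (cv2.Canny(reference, 50, 150), cv2.Canny(test, 50, 150))]:
        mse = float(np.mean((a.astype(np.float32) - b.astype(np.float32)) ** 2))
        feats += [mse, float(ssim(a, b))]
    return feats

# With X as feature vectors for many prints and y as labels (0 = OK,
# 1 = defective), a classifier such as SVC() can then be fit and scored:
# clf = SVC().fit(X_train, y_train); clf.score(X_test, y_test)
```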


The Future of Recycling Is Sorty McSortface

The Atlantic - Technology

At the Boulder County Recycling Center in Colorado, two team members spend all day pulling items from a conveyor belt covered in junk collected from the area's bins. One plucks out juice cartons and plastic bottles that can be reprocessed, while the other searches for contaminants in the stream of paper products headed to a fiber mill. They are Sorty McSortface and Sir Sorts-a-Lot, AI-powered robots that each resemble a supercharged mechanical arm from an arcade claw machine. Developed by the tech start-up Amp Robotics, McSortface and Sorts-a-Lot's appendages dart down with the speed of long-beaked cranes picking fish out of the water, suctioning up items they've been trained to recognize. Yes, even recycling has gotten tangled up in the AI revolution. Amp Robotics has its tech in nearly 80 facilities across the U.S., according to a company spokesperson, and in recent years, AI-powered sorting from companies such as Bulk Handling Systems and MachineX has popped up in other recycling plants.


A look inside the lab building mushroom computers

#artificialintelligence

Upon first glance, the Unconventional Computing Laboratory looks like a regular workspace, with computers and scientific instruments lining its clean, smooth countertops. But if you look closely, the anomalies start appearing. A series of videos shared with PopSci show the weird quirks of this research: on top of the cluttered desks sit large plastic containers with electrodes sticking out of a foam-like substance, and a massive motherboard with tiny oyster mushrooms growing on top of it. No, this lab isn't trying to recreate scenes from "The Last of Us." Its researchers have been working on projects like this for a while: the lab was founded in 2001 on the belief that the computers of the coming century will be made of chemical or living systems, or wetware, working in harmony with hardware and software.


How Artificial Intelligence Is Revolutionizing the Packaging Industry? - The Data Scientist

#artificialintelligence

Artificial intelligence is reshaping how businesses operate and enhancing their capacity to thrive. Recent years have brought many impressive and genuinely useful developments. AI is at work in almost every industry, such as food, cosmetics, wood, and medicine; and because every business needs packaging for its products, the packaging manufacturing industry carries real weight. With this in mind, AI is also playing an impressive role in advancing the packaging industry and transforming the way it works.


OpenPack: A Large-scale Dataset for Recognizing Packaging Works in IoT-enabled Logistic Environments

arXiv.org Artificial Intelligence

Unlike datasets of human daily activities, existing publicly available sensor datasets for work activity recognition in industrial domains are limited by the difficulty of collecting realistic data, as close collaboration with industrial sites is required. This also limits research on and development of AI methods for industrial applications. To address these challenges and contribute to research on machine recognition of work activities in industrial domains, in this study we introduce OpenPack, a new large-scale dataset for packaging work recognition. OpenPack contains 53.8 hours of multimodal sensor data, including keypoints, depth images, acceleration data, and readings from IoT-enabled devices (e.g., handheld barcode scanners used in work procedures), collected from 16 distinct subjects with different levels of packaging work experience. On the basis of this dataset, we propose a neural network model designed to recognize work activities, which efficiently fuses sensor data and readings from IoT-enabled devices by processing them in different streams within a ladder-shaped architecture; experiments showed the effectiveness of this architecture. We believe that OpenPack will contribute to the community of sensor-based action/activity recognition. The OpenPack dataset is available at https://open-pack.github.io/.
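The abstract describes the model only as processing sensor data and IoT readings in separate streams of a ladder-shaped architecture. A minimal two-stream sketch of that idea is below; the layer sizes, fusion rule, and all dimensions are assumed for illustration and are not the paper's model.

```python
# Hedged sketch of a ladder-style two-stream fusion network: sensor and
# IoT-device streams run in parallel, with "rungs" injecting the IoT
# stream into the sensor stream at each level. Sizes are illustrative.
import torch
import torch.nn as nn

class LadderFusion(nn.Module):
    def __init__(self, sensor_dim, iot_dim, hidden=64, n_classes=10, levels=3):
        super().__init__()
        self.sensor_in = nn.Linear(sensor_dim, hidden)
        self.iot_in = nn.Linear(iot_dim, hidden)
        self.sensor_layers = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(levels))
        self.iot_layers = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(levels))
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, sensor, iot):
        s = torch.relu(self.sensor_in(sensor))
        t = torch.relu(self.iot_in(iot))
        for s_layer, t_layer in zip(self.sensor_layers, self.iot_layers):
            s = torch.relu(s_layer(s))
            t = torch.relu(t_layer(t))
            s = s + t                      # rung: fuse IoT stream into sensor stream
        return self.head(s)                # per-window activity logits

# Example: logits = LadderFusion(sensor_dim=75, iot_dim=8)(sensor_batch, iot_batch)
```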


How automation and artificial intelligence could impact the packaging industry

#artificialintelligence

AMP Robotics uses automation and artificial intelligence (AI) to sort materials within waste streams. We asked CEO Matanya Horowitz how this works, how data can be utilised and the ways automated sorting could impact the packaging industry. AMP Robotics developed an AI platform (AMP Neuron) to distinguish recyclable materials from waste. How did you come up with the idea? Ever since I was a child I've been interested in robotics and the origins of intelligence.


AI-Powered 'Smart Bin' Sorts Recycling

#artificialintelligence

A prototype "smart bin" developed by researchers at Australia's University of Technology, Sydney (UTS) can sort recyclable materials automatically through a combination of artificial intelligence (AI), robotics, and machine vision. UTS' Xu Wang said the system can categorize different types of waste such as glass bottles, metal cans, and several varieties of plastic. "We have a camera and we're running an AI algorithm to classify different types of plastics and then we use IoT [Internet of Things] and other robotics technology to sort the waste into the bins," Wang explained. The researchers envision smart bins deployed in shopping centers, schools, cinemas, businesses, and airports.